An Empirical Study on the Fairness of Pre-trained Word Embeddings
Pre-trained word embedding models are easy to distribute and apply, as they spare users the effort of training models themselves. Because such models are widely distributed, it is important to ensure that they do not exhibit undesired behaviour, such as biases against population groups. For this purpose, we carry out an empirical study evaluating the bias of 15 publicly available, pre-trained word embedding models based on three training algorithms (GloVe, word2vec, and fastText) with regard to four bias metrics (WEAT, SEMBIAS, DIRECT BIAS, and ECT). The choice of word embedding models and bias metrics is motivated by a literature survey of 37 publications that quantified bias in pre-trained word embeddings. Our results indicate that fastText is the least biased model (in 8 out of 12 cases) and that small vector lengths lead to a higher bias.
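As a rough illustration of the first metric named above, a WEAT effect size can be computed directly from cosine similarities over a pre-trained embedding. The sketch below is a minimal version assuming gensim's downloader is available; the model name and word sets are illustrative placeholders, not the ones used in the study.

```python
# Minimal WEAT effect-size sketch (Caliskan et al., 2017), assuming gensim is installed.
# The model and word lists below are illustrative only, not those used in the study.
import numpy as np
import gensim.downloader as api

def cos(u, v):
    # Cosine similarity between two word vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def assoc(w, A, B, kv):
    # s(w, A, B): mean similarity of w to attribute set A minus mean similarity to set B
    return np.mean([cos(kv[w], kv[a]) for a in A]) - np.mean([cos(kv[w], kv[b]) for b in B])

def weat_effect_size(X, Y, A, B, kv):
    # d = (mean_x s(x, A, B) - mean_y s(y, A, B)) / std_{w in X∪Y} s(w, A, B)
    sX = [assoc(x, A, B, kv) for x in X]
    sY = [assoc(y, A, B, kv) for y in Y]
    return (np.mean(sX) - np.mean(sY)) / np.std(sX + sY, ddof=1)

if __name__ == "__main__":
    kv = api.load("glove-wiki-gigaword-100")        # any pre-trained KeyedVectors model
    X = ["engineer", "scientist", "programmer"]     # target set 1 (illustrative)
    Y = ["nurse", "teacher", "librarian"]           # target set 2 (illustrative)
    A = ["he", "man", "male"]                       # attribute set 1 (illustrative)
    B = ["she", "woman", "female"]                  # attribute set 2 (illustrative)
    print("WEAT effect size:", weat_effect_size(X, Y, A, B, kv))
```

The same loop over several pre-trained models (e.g. GloVe, word2vec, and fastText vectors of different dimensionalities) would give the kind of per-model, per-metric comparison the study reports.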